<!DOCTYPE html>
<html lang="en">

<!-- Head tag -->
<head><meta name="generator" content="Hexo 3.9.0">

    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">

    <!--Description-->
    
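        <meta name="description" content="Deploying spiders with scrapyd: 1. Write the spider. 2. Set up the deployment environment: pip install scrapyd, pip install scrapyd-client. Start the scrapyd service: cmd:&gt;scrapyd (it must stay running). In the spider project root, run scrapyd-deploy; if it is reported as not being an internal command, the configuration file needs">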
    

    <!--Author-->
    
        <meta name="author" content="ck">
    

    <!--Open Graph Title-->
    
        <meta property="og:title" content="Spider Deployment: scrapyd Deployment + Gerapy Management UI (scrapyd + Gerapy Workflow)">
    

    <!--Open Graph Description-->
    

    <!--Open Graph Site Name-->
    <meta property="og:site_name" content="CK">

    <!--Type page-->
    
        <meta property="og:type" content="article">
    

    <!--Page Cover-->
    

    <meta name="twitter:card" content="summary">
    

    <!-- Title -->
    
    <title>Spider Deployment: scrapyd Deployment + Gerapy Management UI (scrapyd + Gerapy Workflow) - CK</title>

    <!-- Bootstrap Core CSS -->
    <link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.2/css/bootstrap.min.css" integrity="sha384-y3tfxAZXuh4HwSYylfB+J125MxIs6mR5FOHamPBG064zB+AFeWH94NdvaCBm8qnd" crossorigin="anonymous">

    <!-- Custom Fonts -->
    <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.6.3/css/font-awesome.min.css" rel="stylesheet" type="text/css">

    <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
    <!--[if lt IE 9]>
        <script src="//oss.maxcdn.com/libs/html5shiv/3.7.0/html5shiv.js"></script>
        <script src="//oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
    <![endif]-->

    <!-- Gallery -->
    <link href="//cdnjs.cloudflare.com/ajax/libs/featherlight/1.3.5/featherlight.min.css" type="text/css" rel="stylesheet">

    <!-- Custom CSS -->
    <link rel="stylesheet" href="/blog/css/style.css">

    <!-- Google Analytics -->
    


</head>


<body>

<div class="bg-gradient"></div>
<div class="bg-pattern"></div>

<!-- Menu -->
<!--Menu Links and Overlay-->
<div class="menu-bg">
    <div class="menu-container">
        <ul>
            
            <li class="menu-item">
                <a href="/blog/">
                    Home
                </a>
            </li>
            
            <li class="menu-item">
                <a href="/blog/archives">
                    Archives
                </a>
            </li>
            
            <li class="menu-item">
                <a href="/blog/about.html">
                    About
                </a>
            </li>
            
            <li class="menu-item">
                <a href="/blog/tags">
                    Tags
                </a>
            </li>
            
            <li class="menu-item">
                <a href="/blog/categories">
                    Categories
                </a>
            </li>
            
            <li class="menu-item">
                <a href="/blog/contact.html">
                    Contact
                </a>
            </li>
            
        </ul>
    </div>
</div>

<!--Hamburger Icon-->
<nav>
    <a href="#menu"></a>
</nav>

<div class="container">

    <!-- Main Content -->
    <div class="row">
    <div class="col-sm-12">

        <!--Title and Logo-->
        <header>
    <div class="logo">
        <a href="/blog/"><i class="logo-icon fa fa-cube" aria-hidden="true"></i></a>
        
    </div>
</header>

        <section class="main">
            
<div class="post">

    <div class="post-header">
        <h1 class="title">
            <a href="/blog/2019/08/12/爬虫部署-scrapyd部署爬虫-Gerapy管理界面scrapyd-gerapy部署流程/">
                Spider Deployment: scrapyd Deployment + Gerapy Management UI (scrapyd + Gerapy Workflow)
            </a>
        </h1>
        <div class="post-info">
            
                <span class="date">2019-08-12</span>
            
            
            
        </div>
    </div>

    <div class="content">

        <!-- Gallery -->
        

        <!-- Post Content -->
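        <p>--------- Deploying spiders with scrapyd ---------<br>1. Write the spider<br>2. Set up the deployment environment<br>pip install scrapyd<br>pip install scrapyd-client<br>Start the scrapyd service: cmd:&gt;scrapyd (it must stay running)<br>In the spider project root, run: scrapyd-deploy. If the shell reports that it is not an internal command, the configuration file needs to be modified.</p>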
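<p>3. Publish the project to scrapyd<br>Edit scrapy.cfg and remove the # in front of the url line.<br>From the Scrapy project root, run: scrapyd-deploy &lt;target&gt; -p &lt;projectname&gt;<br>Note: target is the name in [deploy:***] in scrapy.cfg, and projectname is the value of project = XXX in scrapy.cfg.<br>4. Start the spider<br>Method 1: from a view in Django's views.py<br>import requests<br>from django.http import JsonResponse<br>from django.views import View<br><br>class StartSpider(View):<br>&nbsp;&nbsp;&nbsp;&nbsp;def get(self, request):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;url = '<a href="http://localhost:6800/schedule.json" target="_blank" rel="noopener">http://localhost:6800/schedule.json</a>'<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;data = {'project': 'ScrapyAbckg', 'spider': 'abckg'}<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;response = requests.post(url=url, data=data)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print(response.text)  # scrapyd's JSON reply<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return JsonResponse({'result': 'OK'})<br>Method 2: from the command line: curl <a href="http://localhost:6800/schedule.json" target="_blank" rel="noopener">http://localhost:6800/schedule.json</a> -d project=&lt;project name&gt; -d spider=&lt;spider name&gt;</p>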
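<p>The target and project names that scrapyd-deploy expects both come from scrapy.cfg, so it can help to check what the file actually declares before deploying. The following is a minimal sketch using Python's standard configparser; the sample file contents and the helper name read_deploy_targets are illustrative assumptions, not part of scrapyd-client.</p>

```python
import configparser

# Hypothetical scrapy.cfg contents. The "staging" target still has its url
# commented out, which is the state step 3 asks you to fix.
SAMPLE_CFG = """
[settings]
default = ScrapyAbckg.settings

[deploy:local]
url = http://localhost:6800/
project = ScrapyAbckg

[deploy:staging]
#url = http://localhost:6800/
project = ScrapyAbckg
"""


def read_deploy_targets(cfg_text):
    """Return {target_name: {"url": ..., "project": ...}} for each deploy section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section == "deploy" or section.startswith("deploy:"):
            # "[deploy]" has no explicit name; it is labelled "default" here.
            name = section.partition(":")[2] or "default"
            targets[name] = {
                "url": parser.get(section, "url", fallback=None),
                "project": parser.get(section, "project", fallback=None),
            }
    return targets


if __name__ == "__main__":
    for name, target in read_deploy_targets(SAMPLE_CFG).items():
        state = "ready" if target["url"] else "url is still commented out"
        print(f"{name}: {state}")
```

<p>A line that starts with # is treated as a comment by configparser, so a target whose url is still commented out shows up with url set to None.</p>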
<p>5. Start Django<br>cmd: python manage.py runserver</p>
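<p>The request that starts a spider can also be built with the standard library alone, which makes it easier to see what is sent and what comes back. This is a sketch under assumptions: the function names are made up for illustration, and the reply is assumed to be JSON with a status field plus either jobid or message, which is how scrapyd normally answers schedule.json.</p>

```python
import json
from urllib import parse, request

SCRAPYD_URL = "http://localhost:6800"  # same address the post uses


def build_schedule_request(project, spider, base_url=SCRAPYD_URL):
    """Build (but do not send) the POST request for schedule.json."""
    body = parse.urlencode({"project": project, "spider": spider}).encode("utf-8")
    return request.Request(f"{base_url}/schedule.json", data=body, method="POST")


def parse_schedule_reply(raw_body):
    """Return the job id from scrapyd's JSON reply, or raise if scheduling failed."""
    reply = json.loads(raw_body)
    if reply.get("status") != "ok":
        raise RuntimeError(f"scrapyd refused the job: {reply.get('message', reply)}")
    return reply["jobid"]


def schedule_spider(project, spider, base_url=SCRAPYD_URL, timeout=10):
    """Send the request to a running scrapyd and return the new job id."""
    req = build_schedule_request(project, spider, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return parse_schedule_reply(resp.read().decode("utf-8"))
```

<p>With scrapyd running, schedule_spider("ScrapyAbckg", "abckg") would return a job id that can later be passed to cancel.json.</p>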
<p>---------------- scrapyd spider management API ----------------<br>1. Get the daemon status</p>
<p><a href="http://127.0.0.1:6800/daemonstatus.json" target="_blank" rel="noopener">http://127.0.0.1:6800/daemonstatus.json</a></p>
<p>2. List the projects</p>
<p><a href="http://127.0.0.1:6800/listprojects.json" target="_blank" rel="noopener">http://127.0.0.1:6800/listprojects.json</a></p>
<p>3. List the spiders published under a project</p>
<p><a href="http://127.0.0.1:6800/listspiders.json?project=myproject" target="_blank" rel="noopener">http://127.0.0.1:6800/listspiders.json?project=myproject</a></p>
<p>4. List the published versions of a project<br><a href="http://127.0.0.1:6800/listversions.json?project=myproject" target="_blank" rel="noopener">http://127.0.0.1:6800/listversions.json?project=myproject</a></p>
<p>5. Get the status of spider jobs</p>
<p><a href="http://127.0.0.1:6800/listjobs.json?project=myproject" target="_blank" rel="noopener">http://127.0.0.1:6800/listjobs.json?project=myproject</a></p>
<p>6. Start a spider on the server (it must already be published to the server)<br><a href="http://localhost:6800/schedule.json" target="_blank" rel="noopener">http://localhost:6800/schedule.json</a><br>(POST, data={"project": myproject, "spider": myspider})</p>
<p>7. Delete one version of a project</p>
<p><a href="http://127.0.0.1:6800/delversion.json" target="_blank" rel="noopener">http://127.0.0.1:6800/delversion.json</a><br>(POST, data={"project": myproject, "version": myversion})</p>
<p>8. Delete a project, including every version published under it<br>(running spiders cannot be deleted)<br><a href="http://127.0.0.1:6800/delproject.json" target="_blank" rel="noopener">http://127.0.0.1:6800/delproject.json</a><br>(POST, data={"project": myproject})</p>
<p>9. Cancel a running spider job<br><a href="http://127.0.0.1:6800/cancel.json" target="_blank" rel="noopener">http://127.0.0.1:6800/cancel.json</a><br>(POST, data={"project": myproject, "job": jobid})</p>
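<p>The nine endpoints above follow one pattern: read-only calls use GET with query parameters, and calls that change state use POST with form data. A small table-driven helper can capture that pattern. This is a sketch, not an official client; the ENDPOINTS table and build_request are names invented here, and only the endpoints and parameters listed above are covered.</p>

```python
from urllib import parse, request

# Endpoint name -> (HTTP method, required parameters), as listed above.
ENDPOINTS = {
    "daemonstatus": ("GET", ()),
    "listprojects": ("GET", ()),
    "listspiders": ("GET", ("project",)),
    "listversions": ("GET", ("project",)),
    "listjobs": ("GET", ("project",)),
    "schedule": ("POST", ("project", "spider")),
    "delversion": ("POST", ("project", "version")),
    "delproject": ("POST", ("project",)),
    "cancel": ("POST", ("project", "job")),
}


def build_request(endpoint, base_url="http://127.0.0.1:6800", **params):
    """Build (but do not send) a urllib Request for one scrapyd endpoint."""
    method, required = ENDPOINTS[endpoint]
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError(f"{endpoint}.json needs: {', '.join(missing)}")
    url = f"{base_url}/{endpoint}.json"
    encoded = parse.urlencode(params)
    if method == "GET":
        # GET endpoints carry their parameters in the query string.
        if encoded:
            url = f"{url}?{encoded}"
        return request.Request(url, method="GET")
    # POST endpoints carry their parameters as form data in the body.
    return request.Request(url, data=encoded.encode("utf-8"), method="POST")
```

<p>For example, build_request("listjobs", project="myproject") produces the same URL as item 5, and the result can be passed to urllib.request.urlopen once scrapyd is running.</p>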
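<p>-------------- django + scrapy --------------<br>1. Create a Django project, write models.py, and start the Django project</p>
<p>2. Create the Scrapy project inside the Django project root<br>(this configuration is required by scrapy-djangoitem)<br>To embed Django, add the following code to Scrapy's settings.py:<br>import os<br>import sys<br># add the Django project root (the parent of the current directory) to the import path<br>sys.path.append(os.path.dirname(os.path.abspath('.')))<br># replace &lt;django_project_name&gt; with the name of your Django project<br>os.environ['DJANGO_SETTINGS_MODULE'] = '&lt;django_project_name&gt;.settings'<br>import django<br>django.setup()</p>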
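<p>The snippet above resolves the Django root from the current working directory, so it only works when the crawl is started from the Scrapy project root. A possible variation, sketched here under the assumption that the layout is &lt;django_root&gt;/&lt;scrapy_project&gt;/&lt;scrapy_package&gt;/settings.py, derives the root from the location of settings.py instead. The function name point_scrapy_at_django and the example names are hypothetical.</p>

```python
import os
import sys
from pathlib import Path


def point_scrapy_at_django(settings_file, django_settings_module):
    """Put the Django project root on sys.path and select its settings module.

    Assumed layout: <django_root>/<scrapy_project>/<scrapy_package>/settings.py,
    so the Django root is three levels above the settings file itself.
    """
    django_root = Path(settings_file).resolve().parents[2]
    if str(django_root) not in sys.path:
        sys.path.append(str(django_root))
    os.environ["DJANGO_SETTINGS_MODULE"] = django_settings_module
    return django_root


# In Scrapy's settings.py this would be called as
#     point_scrapy_at_django(__file__, "mysite.settings")
# and followed, as in the original snippet, by:
#     import django
#     django.setup()
```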
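<p>3. Write the spider<br>4. Import the Django model class in items.py (pip install scrapy-djangoitem)<br>from scrapy_djangoitem import DjangoItem<br>from &lt;app_name&gt; import models  # &lt;app_name&gt; is your Django app<br>class ScrapyabckgItem(DjangoItem):<br>&nbsp;&nbsp;&nbsp;&nbsp;# This attribute must be named django_model; in the spider, set fields with item['title'] = xxx<br>&nbsp;&nbsp;&nbsp;&nbsp;django_model = models.AbckgModel</p>
<p>5. Call save() in pipelines.py<br>class ScrapyabckgPipeline(object):<br>&nbsp;&nbsp;&nbsp;&nbsp;def process_item(self, item, spider):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# insert the item into the database<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;item.save()<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return item  # pass the item on to the next pipeline</p>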
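<p>The pipeline contract is small: process_item receives each item, does its work, and returns the item so that later pipelines still see it. The sketch below shows that contract with logging around save(). FakeItem is a hypothetical stand-in for a DjangoItem so that the example runs without Django or Scrapy installed, and SavingPipeline is an illustrative name rather than the class from the post.</p>

```python
import logging

logger = logging.getLogger(__name__)


class SavingPipeline:
    """Persist each item with item.save(), then hand it to the next pipeline."""

    def process_item(self, item, spider):
        try:
            item.save()
        except Exception:
            # Log with traceback and re-raise so the failure is visible in the crawl log.
            logger.exception("could not save item %r", item)
            raise
        return item


class FakeItem(dict):
    """Hypothetical stand-in for a DjangoItem; it only counts save() calls."""

    saved = 0

    def save(self):
        FakeItem.saved += 1


if __name__ == "__main__":
    item = FakeItem(title="example")
    returned = SavingPipeline().process_item(item, spider=None)
    print("same item returned:", returned is item)
    print("items saved:", FakeItem.saved)
```

<p>Scrapy only runs a pipeline that is listed in ITEM_PIPELINES in settings.py, for example ITEM_PIPELINES = {'ScrapyAbckg.pipelines.ScrapyabckgPipeline': 300}; the module path in that example is an assumption based on the project name used earlier.</p>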
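<p>6. Start the spider (from the command line, not through scrapyd): scrapy crawl abckg<br>7. Refresh the django-admin backend</p>
<p>--------------- Gerapy management UI ---------------</p>
<p>Installation:</p>
<p>Why deploy?<br>Running spiders under a deployed service makes the project more stable and efficient, and lets it handle more requests.</p>
<p>Differences between a scrapyd deployment and a Gerapy deployment<br>The scrapyd web page is bare and only shows status. Gerapy is more user-friendly: the interface is clean and spiders can be operated directly from it.</p>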
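<p>Workflow summary:</p>
<p>1. Start scrapyd:</p>
<p>　　Command: scrapyd</p>
<p>2. Start Gerapy:</p>
<p>　　Command: gerapy runserver</p>
<p>3. Deploy the Scrapy spider to scrapyd by publishing it from the spider project root:</p>
<p>　　Command: scrapyd-deploy &lt;target&gt; -p &lt;project name&gt;</p>
<p>4. Open Gerapy:</p>
<p>　　Address: 127.0.0.1:8000</p>
<p>5. In the Gerapy page:</p>
<p>　　Host management -&gt; Create -&gt; connect to scrapyd</p>
<p>6. In the Gerapy page:</p>
<p>　　Click Schedule -&gt; all published spiders are listed</p>
<p>7. Monitor and run these spiders as needed</p>
<p>OK, all done! Deploying spiders on CentOS or any other Linux host works the same way; only the IP address differs.</p>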
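<p>Steps 5 to 7 only work if scrapyd from step 1 is already answering, so a short readiness check before opening Gerapy can save some confusion. The sketch below polls daemonstatus.json a few times. The function names, retry count, and delay are assumptions chosen for illustration, and the fetch and sleep parameters exist only so the logic can be exercised without a live server.</p>

```python
import json
import time
from urllib import error, request


def fetch_status(base_url, timeout=3):
    """Fetch and decode daemonstatus.json from a scrapyd instance."""
    with request.urlopen(f"{base_url}/daemonstatus.json", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_scrapyd(base_url="http://127.0.0.1:6800", attempts=5, delay=1.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Return True once scrapyd reports status "ok", False if it never does."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "ok":
                return True
        except (error.URLError, OSError, ValueError):
            pass  # connection refused, timeout, or a reply that was not JSON
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    if wait_for_scrapyd():
        print("scrapyd is up; it can now be added as a host in Gerapy")
    else:
        print("scrapyd is not responding; start it with the scrapyd command first")
```

<p>On a remote Linux host, base_url would point at that host's IP address instead of 127.0.0.1.</p>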

    </div>

    

    

    <!-- Comments -->
    

</div>
        </section>

    </div>
</div>


</div>

<!-- Footer -->
<div class="push"></div>

<footer class="footer-content">
    <div class="container">
        <div class="row">
            <div class="col-xs-12 col-sm-12 col-md-6 col-lg-6 footer-about">
                <h2>About</h2>
                <p>
                    This theme was developed by <a href="https://github.com/klugjo">Jonathan Klughertz</a>. The source code is available on Github. Create Websites. Make Magic.
                </p>
            </div>
            
    <div class="col-xs-6 col-sm-6 col-md-3 col-lg-3 recent-posts">
        <h2>Recent Posts</h2>
        <ul>
            
            <li>
                <a class="footer-post" href="/blog/2019/08/12/如何快速搭建hexo技术博客/">How to quickly set up a Hexo tech blog</a>
            </li>
            
            <li>
                <a class="footer-post" href="/blog/2019/08/12/二进制字典数据处理/">Processing binary dictionary data</a>
            </li>
            
            <li>
                <a class="footer-post" href="/blog/2019/08/12/Dos-命令手记/">DOS command notes</a>
            </li>
            
            <li>
                <a class="footer-post" href="/blog/2019/08/12/Mysql-命令手记/">MySQL command notes</a>
            </li>
            
        </ul>
    </div>



            
        </div>
        <div class="row">
            <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12">
                <ul class="list-inline footer-social-icons">
                    
                    <li class="list-inline-item">
                        <a href="https://github.com/klugjo/hexo-theme-alpha-dust">
                            <span class="footer-icon-container">
                                <i class="fa fa-github"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://twitter.com/?lang=en">
                            <span class="footer-icon-container">
                                <i class="fa fa-twitter"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://www.facebook.com/">
                            <span class="footer-icon-container">
                                <i class="fa fa-facebook"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://www.instagram.com/">
                            <span class="footer-icon-container">
                                <i class="fa fa-instagram"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://dribbble.com/">
                            <span class="footer-icon-container">
                                <i class="fa fa-dribbble"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://plus.google.com/">
                            <span class="footer-icon-container">
                                <i class="fa fa-google-plus"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://www.behance.net/">
                            <span class="footer-icon-container">
                                <i class="fa fa-behance"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="https://500px.com/">
                            <span class="footer-icon-container">
                                <i class="fa fa-500px"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="mailto:test@example.com">
                            <span class="footer-icon-container">
                                <i class="fa fa-envelope-o"></i>
                            </span>
                        </a>
                    </li>
                    
                    
                    <li class="list-inline-item">
                        <a href="\#">
                            <span class="footer-icon-container">
                                <i class="fa fa-rss"></i>
                            </span>
                        </a>
                    </li>
                    
                </ul>
            </div>
        </div>
        <div class="row">
            <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12">
                <div class="footer-copyright">
                    @Untitled. All rights reserved | Design &amp; Hexo <a href="http://www.codeblocq.com/">Jonathan Klughertz</a>
                </div>
            </div>
        </div>
    </div>
</footer>

<!-- After footer scripts -->

<!-- jQuery -->
<script src="//code.jquery.com/jquery-2.1.4.min.js"></script>

<!-- Tween Max -->
<script src="//cdnjs.cloudflare.com/ajax/libs/gsap/1.18.5/TweenMax.min.js"></script>

<!-- Gallery -->
<script src="//cdnjs.cloudflare.com/ajax/libs/featherlight/1.3.5/featherlight.min.js" type="text/javascript" charset="utf-8"></script>

<!-- Custom JavaScript -->
<script src="/blog/js/main.js"></script>

<!-- Disqus Comments -->



</body>

</html>